DATA1220-55, Fall 2024
2024-09-20
Instructions (homework2_instructions.pdf), a Quarto markdown template (homework2_template.qmd), and an example HTML output (homework2_example.html) are available for download under Chapter 2 on the Modules page in Canvas.
A video walk-through of Homework 2 is available under Tutorials on the Modules page in Canvas. Make sure you’re caught up on the video walk-through of Homework 1.
Upload TWO (2) documents to Homework 2 on the Assignments page in Canvas by Friday 9/20/2024 at 6:00pm: homework2_yourlastname.qmd and homework2_yourlastname.html
Read the instructions! Some of the issues you’re having are because you did not follow them correctly.
Please answer in complete sentences where possible! I want you to practice effectively communicating data, and life is not a multiple choice question. I will be more clear about indicating this on future homework.
Real world distributions are harder to describe than idealized theoretical distributions. Combining visual and numeric summaries is more powerful than using either alone.
Turn on notifications. Your question may have already been asked and answered. Campuswire can email you when there are new posts, so you can keep up with the discussion.
Be specific! A detailed question is more likely to get a (useful) answer than a general question.
Include code & error messages. It is much easier to troubleshoot “My document won’t render. I’ve copy-pasted the error message and the lines of code where it breaks.” than “My document won’t render.” Click here for more info on how to ask good debugging questions.
Read the textbook. Many of you are asking for additional examples. Luckily, there are tons we didn’t go over in the textbook.
Ask a question on our Campuswire class feed. I’m only one person, and I may not be able to give you a prompt answer. However, the 20+ other people in the class might be able to.
I will try to keep an eye on Campuswire posts between 4 and 6pm before the homework is due, but I have other things going on and might miss something.
Probability: The proportion of times that a particular outcome would occur if we observed a random process an infinite number of times (\(\operatorname{P}(\operatorname{Event = A})\)).
Ranges from 0 to 1 or 0% to 100%
\(0 \le \operatorname{probability} \le 1\)
Probability = Proportion
Random process: you know which outcomes are possible (i.e. the sample space) but you don’t know which outcome comes next
Sample space: all possible outcomes of a random process (\(S\))
Disjoint events: events that CANNOT occur at the same time (mutually exclusive)
Complement: the complement of an event \(A\) in sample space \(S\) is the set of all outcomes in \(S\) that are NOT in \(A\) (\(A^C\) or \(A'\))
An event and its complement are always disjoint
The probability of event A occurring OR the complement of event A occurring is always 1
Non-disjoint events: events that CAN occur at the same time
Remember…
\(\operatorname{P}(S)=1\)
\(\operatorname{P}(S)=\operatorname{P}(A)+\operatorname{P}(A')\)
\(\operatorname{P}(A)+\operatorname{P}(A')=1\)
\(\operatorname{P}(A')=1-\operatorname{P}(A)\)
\[ p=\frac{\operatorname{count}(\operatorname{events = A})}{\operatorname{count}(\operatorname{all events in sample space})} \]
\[ \hat{p}_n=\frac{\operatorname{count}(\operatorname{observation = A})}{\operatorname{count}(\operatorname{observations in sample})} \]
How well the sample proportion \(\hat{p}_n\) represents the population proportion \(p\) depends on the size of the denominator, i.e. the number of observations \(n\) in the sample.
As more observations are collected, the sample proportion \(\hat{p}_n\) of a particular outcome approaches the population proportion \(p\) of that outcome.
\(\lim_{n\to\infty} \hat{p}_n = p\) (As \(n \to \infty\), \(\hat{p}_n \to p\))
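The convergence of \(\hat{p}_n\) to \(p\) can be seen in a small simulation. This is a minimal sketch, not part of the original slides: the value `p = 0.14`, the seed, and the helper name `sample_proportion` are assumptions chosen for illustration.

```python
import random


def sample_proportion(p, n, rng):
    """Simulate n observations of a yes/no random process and return p-hat."""
    hits = sum(1 for _ in range(n) if rng.random() < p)
    return hits / n


rng = random.Random(1220)  # fixed seed so the run is reproducible
p = 0.14                   # assumed population proportion (illustration only)

# Larger samples should usually land closer to p.
for n in (10, 100, 1_000, 100_000):
    p_hat = sample_proportion(p, n, rng)
    print(f"n = {n:>7,}  p-hat = {p_hat:.4f}  |p-hat - p| = {abs(p_hat - p):.4f}")
```

Any single small sample can land far from \(p\). The gap tends to shrink as \(n\) grows, but it does not shrink steadily on every run.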
Sampling with replacement is like drawing a card from a deck, then shuffling it back in before drawing another card. Repetition is possible.
Sampling WITHOUT replacement
Independent and dependent processes
Calculating probabilities for 2 events
(General) Addition Rule
(General) Multiplication Rule
Probability distributions
Extracted song names and artists from the “Taylor Swift Radio” playlist on Spotify
There are 50 songs on the playlist by 26 different artists.
Some artists have more than 1 song on the playlist, and 1 song was a collaboration between 2 artists.
The iPod Shuffle originally used random sampling WITH replacement to select the next song to play (“true” shuffle)
Spotify originally used random sampling WITHOUT replacement to select the next song to play (Fisher-Yates Algorithm)
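The two shuffle styles can be compared side by side. This is a rough sketch under stated assumptions: the song titles are hypothetical placeholders, and `fisher_yates` is a textbook version of the algorithm, not Spotify's or Apple's actual code.

```python
import random

rng = random.Random(55)  # fixed seed so the run is reproducible
playlist = [f"song_{i:02d}" for i in range(1, 51)]  # hypothetical titles for the 50 songs

# WITH replacement: every pick draws from the full playlist, so repeats can occur.
with_replacement = rng.choices(playlist, k=50)


def fisher_yates(items, rng):
    """Return a shuffled copy: walk from the end, swapping each position
    with a randomly chosen position at or before it."""
    shuffled = list(items)
    for i in range(len(shuffled) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
    return shuffled


# WITHOUT replacement: each song appears exactly once in the play order.
without_replacement = fisher_yates(playlist, rng)

print(len(set(with_replacement)))     # distinct songs heard; repeats make this at most 50
print(len(set(without_replacement)))  # always 50
```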
See board…
See board…
\[ \begin{aligned} \operatorname{P}(\operatorname{Next Song by Chappell Roan})&=\frac{\operatorname{count}(\operatorname{Songs by Chappell Roan})}{\operatorname{count}(\operatorname{All Possible Songs})} \\ &= \frac{7}{50} \\ &= 0.14 \end{aligned} \]
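The same count-over-count calculation can be written in a few lines of Python. This is a minimal sketch: only the counts (7 of 50 songs) come from the slides, while the stand-in playlist and the helper name `probability` are assumptions for illustration.

```python
from fractions import Fraction

# Hypothetical stand-in for the playlist: only the counts come from the slides.
playlist = ["Chappell Roan"] * 7 + ["Other artist"] * 43


def probability(event, sample_space):
    """P(event) = count(outcomes where event is true) / count(all outcomes)."""
    favorable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favorable, len(sample_space))


p_roan = probability(lambda artist: artist == "Chappell Roan", playlist)
p_not_roan = 1 - p_roan  # complement rule: P(A') = 1 - P(A)

print(p_roan, float(p_roan))          # 7/50 0.14
print(p_not_roan, float(p_not_roan))  # 43/50 0.86
```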
See board…
Describes the probability of event A or event B occurring when A and B are disjoint (\(P(A \cup B)\))
\[\operatorname{P}(\operatorname{A or B}) = \operatorname{P}(\operatorname{A}) + \operatorname{P}(\operatorname{B})\]
See board…
Describes the probability of event A or event B occurring, whether or not A and B are disjoint
\[\operatorname{P}(\operatorname{A or B}) = \operatorname{P}(\operatorname{A}) + \operatorname{P}(\operatorname{B}) - \operatorname{P}(\operatorname{A and B})\]
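The general addition rule, P(A or B) = P(A) + P(B) - P(A and B), can be checked by listing outcomes. This sketch assumes one roll of a fair six-sided die. The events chosen here are illustrative and not from the slides.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair six-sided die


def p(event):
    """Probability of an event (a set of outcomes) when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


even = {2, 4, 6}
greater_than_3 = {4, 5, 6}
one = {1}

# Non-disjoint events: subtract the overlap {4, 6} so it is not counted twice.
lhs = p(even | greater_than_3)
rhs = p(even) + p(greater_than_3) - p(even & greater_than_3)
print(lhs, rhs)  # 2/3 2/3

# Disjoint events: the overlap is empty, so the rule reduces to P(A) + P(B).
print(p(even | one), p(even) + p(one))  # 2/3 2/3
```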
See board…
Two random processes are independent if the outcome of process A provides no information about process B
You roll a die once and get a 3. You still don’t know what number you’ll roll next.
You aren’t more likely to land on heads when flipping a coin just because you also got heads on your last one.
What if we listened to 2 songs in a row using “true” shuffle (i.e. sampling with replacement)?
Each song goes back into the “pot” after it is played and can be repeated.
What happens to the sample space after the first song?
The sample space does not change between events.
See board…
7 opportunities for song 1 to be by Chappell Roan
7 opportunities for song 2 to be by Chappell Roan
\(7 \times 7 = 49 \operatorname{possibilities!}\)
50 possible songs for song 1
50 possible songs for song 2
\(50 \times 50 = 2500 \operatorname{possibilities!}\)
\(\operatorname{P}(\operatorname{A and B})=\frac{7 \times 7}{50 \times 50}\)
Describes the probability of event A and event B occurring
\[ \operatorname{P}(\operatorname{A and B})=\operatorname{P}(\operatorname{A}) \times \operatorname{P}(\operatorname{B}) \]
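The counting argument for “true” shuffle can be verified by enumerating every ordered pair of songs. This is a sketch under an assumed labeling: songs numbered 0 to 6 stand in for the 7 Chappell Roan songs, and the rest stand in for everything else.

```python
from fractions import Fraction
from itertools import product

songs = range(50)  # hypothetical song IDs


def is_roan(song):
    """Assumed labeling: IDs 0-6 are the 7 Chappell Roan songs."""
    return song < 7


# With replacement, song 2 is drawn from the same 50 songs as song 1.
pairs = list(product(songs, repeat=2))
both_roan = [pair for pair in pairs if is_roan(pair[0]) and is_roan(pair[1])]

p_both = Fraction(len(both_roan), len(pairs))
print(len(both_roan), len(pairs))  # 49 2500
print(p_both, float(p_both))       # 49/2500 0.0196
```

The result matches \(\operatorname{P}(\operatorname{A}) \times \operatorname{P}(\operatorname{B}) = \frac{7}{50} \times \frac{7}{50}\).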
Two random processes are dependent if the probability of process B changes based on the outcome of process A
You’re playing poker and cards are about to be dealt. The probability that you will receive an Ace changes as each card is distributed.
The chances that you’ll bring an umbrella with you when you leave the house change depending on whether or not it’s raining.
Sampling without replacement is like drawing a card from a deck, then drawing another card without putting the first one back. Repetition is NOT possible.
What if we listened to 2 songs in a row using Spotify’s shuffle (i.e. sampling without replacement)?
Each song does NOT go back into the “pot” after it is played and CANNOT be repeated.
The sample space does change between events.
See board…
7 opportunities for song 1 to be by Chappell Roan
6 opportunities for song 2 to be by Chappell Roan
\(7 \times 6 = 42 \operatorname{possibilities!}\)
50 possible songs for song 1
49 possible songs for song 2
\(50 \times 49 = 2450 \operatorname{possibilities!}\)
\(\operatorname{P}(\operatorname{A and B})=\frac{7 \times 6}{50 \times 49}\)
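The without-replacement counts can be checked the same way, this time enumerating ordered pairs in which no song repeats. As before, this is a sketch: the song IDs and the `is_roan` labeling are assumptions for illustration.

```python
from fractions import Fraction
from itertools import permutations

songs = range(50)  # hypothetical song IDs


def is_roan(song):
    """Assumed labeling: IDs 0-6 are the 7 Chappell Roan songs."""
    return song < 7


# Without replacement, song 2 must differ from song 1.
pairs = list(permutations(songs, 2))
both_roan = [pair for pair in pairs if is_roan(pair[0]) and is_roan(pair[1])]

p_both = Fraction(len(both_roan), len(pairs))
print(len(both_roan), len(pairs))       # 42 2450
print(p_both, round(float(p_both), 4))  # 3/175 0.0171
```

Here `Fraction` reduces 42/2450 to 3/175, which equals \(\frac{7}{50} \times \frac{6}{49}\).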
Describes the probability of event A and event B occurring, where \(\operatorname{P}(\operatorname{B given A})\) is the probability of event B occurring given that event A has already occurred
\[ \operatorname{P}(\operatorname{A and B})=\operatorname{P}(\operatorname{A}) \times \operatorname{P}(\operatorname{B given A}) \]
Distinguishing Independent and Dependent Processes
When processes are independent, \(\operatorname{P}(\operatorname{A and B})=\operatorname{P}(\operatorname{A}) \times \operatorname{P}(\operatorname{B})\).
If \(\operatorname{P}(\operatorname{A and B})\ne\operatorname{P}(\operatorname{A}) \times \operatorname{P}(\operatorname{B})\), the processes are NOT independent.
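This test can be applied to both shuffle styles. It is a sketch only: `check_independence` is a hypothetical helper, A is "song 1 is by Chappell Roan", B is "song 2 is by Chappell Roan", and the song IDs are stand-ins for the playlist.

```python
from fractions import Fraction
from itertools import permutations, product

SONGS = range(50)  # hypothetical song IDs


def is_roan(song):
    """Assumed labeling: IDs 0-6 are the 7 Chappell Roan songs."""
    return song < 7


def check_independence(pairs):
    """Return True if P(A and B) == P(A) * P(B) over equally likely (song 1, song 2) pairs."""
    pairs = list(pairs)
    n = len(pairs)
    p_a = Fraction(sum(is_roan(first) for first, _ in pairs), n)
    p_b = Fraction(sum(is_roan(second) for _, second in pairs), n)
    p_a_and_b = Fraction(sum(is_roan(f) and is_roan(s) for f, s in pairs), n)
    return p_a_and_b == p_a * p_b


print(check_independence(product(SONGS, repeat=2)))  # True: with replacement
print(check_independence(permutations(SONGS, 2)))    # False: without replacement
```

In both cases \(\operatorname{P}(\operatorname{A})\) and \(\operatorname{P}(\operatorname{B})\) each equal 7/50. Only the joint probability differs, and that is what the comparison detects.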
Probability Mass Function: categorical
Probability Density Function: numerical
DATA1220-55 Fall 2024, Class 10 | Updated: 2024-09-20 | Canvas | Campuswire